
Search Autocomplete System Design

Table of Contents

  1. Requirements (incl. Capacity Estimation)
  2. API Interface
  3. High-Level Design
  4. Deep Dives
  5. Scalability & Performance Optimizations
  6. Monitoring & Metrics
  7. Trade-offs & Considerations
  8. Summary
  9. Alternative Approaches
  10. Advanced Features
  11. Failure Scenarios & Recovery
  12. Interview Tips

1. Requirements (~5 minutes)

Functional Requirements

  • ✅ Users should get suggestions as they type (after 2-3 characters)
  • ✅ System should return top 10 most relevant suggestions
  • ✅ Suggestions should be ranked by popularity (search frequency)
  • ✅ Support for personalized suggestions based on user history
  • ✅ Support for multi-language queries
  • ✅ Handle typos and spelling corrections (fuzzy matching)
  • ✅ Support trending/real-time queries (breaking news, events)

Non-functional Requirements

  • Ultra-low latency: < 100ms response time (P99)
  • High availability: 99.99% uptime
  • Scalable: Support 10M+ queries per second
  • Eventual consistency: Acceptable for popularity updates
  • Real-time updates: New trending queries appear within minutes
  • Support 5B+ unique queries in the database

Out of Scope

  • ❌ Full search results (only suggestions)
  • ❌ Image/video search
  • ❌ Voice search

Capacity Estimation

Assumptions:

  • Daily Active Users (DAU): 500M
  • Average searches per user: 10/day
  • Average keystrokes per search: 10 characters
  • Suggestions requested after 2 characters
  • Total unique queries: 5B

QPS:

Daily Searches = 500M users × 10 searches = 5B searches/day
Autocomplete Requests = 5B × 8 keystrokes (after 2nd char) = 40B requests/day
QPS = 40B / 86400s = ~463K requests/sec
Peak QPS (3x average) = ~1.4M requests/sec

Storage:

Query Storage:
- Average query length: 30 characters
- Unique queries: 5B
- Storage = 5B × 30 bytes = 150 GB

With Metadata (frequency, timestamp):
- Per query: 30 bytes (query) + 8 bytes (count) + 8 bytes (timestamp) = 46 bytes
- Total = 5B × 46 bytes = 230 GB

Trie Data Structure (in-memory):
- Trie nodes: ~26 children per node (alphabet)
- Estimated nodes: ~50M nodes per partition (see Section 4.3)
- Per node: 26 pointers (8 bytes each) + data (20 bytes) = 228 bytes
- Total = 50M × 228 bytes = ~11.4 GB per Trie partition

Bandwidth:

Request Size = 50 bytes (prefix + metadata)
Response Size = 10 suggestions × 50 bytes = 500 bytes
Peak Bandwidth In = 1.4M × 50 bytes = 70 MB/s
Peak Bandwidth Out = 1.4M × 500 bytes = 700 MB/s

2. API Interface (~2 minutes)

REST API

GET /api/v1/autocomplete?q={prefix}&limit=10&lang=en&user_id={userId}

Request:
GET /api/v1/autocomplete?q=face&limit=10&lang=en

Response:
{
  "suggestions": [
    {"text": "facebook", "frequency": 15000000, "type": "website"},
    {"text": "facebook login", "frequency": 8500000, "type": "query"},
    {"text": "face masks", "frequency": 5200000, "type": "product"},
    {"text": "facebook messenger", "frequency": 4100000, "type": "website"}
  ],
  "latency_ms": 45
}

Query Parameters

  • q: Search prefix (required)
  • limit: Number of suggestions (default: 10, max: 20)
  • lang: Language code (default: en)
  • user_id: For personalized results (optional)
  • location: For geo-specific suggestions (optional)
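
A minimal client-side call against the endpoint above, assuming the Python requests library and a placeholder host:

import requests

resp = requests.get(
    "https://api.example.com/api/v1/autocomplete",   # placeholder host
    params={"q": "face", "limit": 10, "lang": "en"},
    timeout=0.2,  # autocomplete answers are useless if they arrive late; fail fast
)
for s in resp.json()["suggestions"]:
    print(s["text"], s["frequency"])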

3. High-Level Design (~10-15 minutes)

Architecture Components

Client Layer:

  • Web browsers, mobile apps
  • Debouncing: Wait 150ms after the last keystroke before making the API call (see the sketch after this list)
  • Client-side caching: Cache last 50 queries
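
A debouncing sketch, kept in Python/asyncio for consistency with the other snippets in this document (in a browser this would be a setTimeout/clearTimeout pair):

import asyncio

class Debouncer:
    """Fire the callback only after `delay` seconds with no new keystrokes."""

    def __init__(self, callback, delay: float = 0.15):
        self.callback = callback
        self.delay = delay
        self._task = None

    def on_keystroke(self, text: str) -> None:
        if self._task and not self._task.done():
            self._task.cancel()  # a newer keystroke supersedes the pending call
        self._task = asyncio.create_task(self._fire(text))

    async def _fire(self, text: str) -> None:
        try:
            await asyncio.sleep(self.delay)
            await self.callback(text)  # e.g. GET /api/v1/autocomplete?q=<text>
        except asyncio.CancelledError:
            pass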

Edge Layer:

  • CDN/Edge servers at 200+ locations
  • Cache popular prefixes (hit rate: ~40%)

API Layer:

  • API Gateway: Rate limiting, authentication
  • Load Balancer: Distribute to autocomplete servers
  • Autocomplete Service: Core suggestion logic

Data Layer:

  • Trie Servers: In-memory Trie data structure
  • Redis Cluster: Cache top suggestions per prefix
  • Cassandra: Historical query data, frequency counts
  • Kafka: Real-time query stream for analytics

Analytics Layer:

  • Data Aggregator: Process query logs, update frequencies
  • Trending Service: Detect real-time trends
  • ML Service: Personalization models

Architecture Diagram (request flow): Client → CDN/Edge cache → API Gateway → Load Balancer → Autocomplete Service → Trie servers, with Redis as the hot cache, Cassandra as the persistent store, and query logs streaming asynchronously to Kafka for the analytics layer.

4. Deep Dives (~10 minutes)

4.1 Trie Data Structure (Core Algorithm)

Why Trie?

  • Efficient prefix matching: O(len(prefix))
  • Space-efficient for common prefixes
  • Easy to traverse for top-K suggestions

Trie Node Structure:

TrieNode {
    children: Map<char, TrieNode>   // one child node per character
    isEndOfWord: boolean
    frequency: long                 // how many times this exact query was searched
    topSuggestions: List<String>    // pre-computed top 10 for this prefix
}

Example Trie for ["face", "facebook", "fact"]:

    root
      |
      f
      |
      a
      |
      c
     / \
    e   t
    |   |
    *   *
    |
   book
    |
    *

* = isEndOfWord = true

Lookup Process:

  1. Start at root
  2. Traverse prefix: 'f' → 'a' → 'c' → 'e'
  3. At 'e' node, get pre-computed topSuggestions
  4. Return top 10

Time Complexity:

  • Lookup: O(len(prefix)) + O(1) = O(p)
  • Space: O(total_chars_in_all_queries)
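
A minimal Python sketch of the idea (hypothetical names: TrieNode, Trie.insert, Trie.lookup): every node along a query's path keeps a pre-computed top-K list, so a lookup is a prefix walk plus a constant-time read.

from dataclasses import dataclass, field

K = 10  # suggestions kept per node

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)          # char -> TrieNode
    is_end: bool = False
    frequency: int = 0
    top_suggestions: list = field(default_factory=list)   # (query, frequency) pairs

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query: str, frequency: int) -> None:
        """Insert a query and refresh the top-K list of every prefix node on its path."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            self._update_top_k(node, query, frequency)
        node.is_end = True
        node.frequency = frequency

    @staticmethod
    def _update_top_k(node: TrieNode, query: str, frequency: int) -> None:
        entries = [e for e in node.top_suggestions if e[0] != query]
        entries.append((query, frequency))
        entries.sort(key=lambda e: e[1], reverse=True)
        node.top_suggestions = entries[:K]

    def lookup(self, prefix: str) -> list:
        """O(len(prefix)) walk, then return the pre-computed suggestions."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for query, _ in node.top_suggestions]

trie = Trie()
for q, f in [("facebook", 15_000_000), ("face masks", 5_200_000), ("fact", 900_000)]:
    trie.insert(q, f)
print(trie.lookup("fac"))  # ['facebook', 'face masks', 'fact']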

4.2 Ranking Algorithm

Multi-factor Scoring:

Score = w1 × FrequencyScore +
w2 × RecencyScore +
w3 × PersonalizationScore +
w4 × TrendingScore

Where:
- FrequencyScore = log(query_count) / log(max_count)
- RecencyScore = e^(-decay_factor × days_since_last_search)
- PersonalizationScore = user_clicked_before ? 1.5 : 0
- TrendingScore = spike_in_last_hour ? 2.0 : 0

Weights: w1=0.5, w2=0.2, w3=0.2, w4=0.1

Example:

Query: "facebook"
- Frequency: 15M searches → FrequencyScore = 0.95
- Last searched: TodayRecencyScore = 1.0
- User clicked before: YesPersonalizationScore = 1.5
- Not trending now → TrendingScore = 0

Final Score = 0.5×0.95 + 0.2×1.0 + 0.2×1.5 + 0.1×0 = 1.075
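
The same scoring expressed as a small Python sketch; the weights, boosts, and decay factor are the illustrative values from this section, not tuned constants, and max_count is an assumed normalization denominator.

import math

W_FREQ, W_RECENCY, W_PERSONAL, W_TREND = 0.5, 0.2, 0.2, 0.1
DECAY = 0.1  # assumed per-day decay factor

def score(query_count: int, max_count: int, days_since_last_search: float,
          user_clicked_before: bool, trending_now: bool) -> float:
    frequency_score = math.log(query_count) / math.log(max_count)
    recency_score = math.exp(-DECAY * days_since_last_search)
    personalization_score = 1.5 if user_clicked_before else 0.0
    trending_score = 2.0 if trending_now else 0.0
    return (W_FREQ * frequency_score + W_RECENCY * recency_score
            + W_PERSONAL * personalization_score + W_TREND * trending_score)

# "facebook": 15M searches, searched today, clicked before, not trending
print(round(score(15_000_000, 20_000_000, 0, True, False), 3))
# ~0.99 here; the worked example above rounds FrequencyScore to 0.95, giving 0.975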

4.3 Trie Partitioning (Horizontal Scaling)

Problem: Single Trie can't fit all 5B queries in memory

Solution: Partition by first letter(s)

Server 1: A-D queries (Trie 1)
Server 2: E-H queries (Trie 2)
Server 3: I-L queries (Trie 3)
Server 4: M-P queries (Trie 4)
Server 5: Q-T queries (Trie 5)
Server 6: U-Z queries (Trie 6)

Routing Logic:

prefix = "face"
first_char = prefix[0] = 'f'
server = route_to_server(first_char) // Server 2 (E-H)
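
A possible routing sketch; the letter ranges mirror the table above (real deployments often add consistent hashing or split hot letters into finer partitions):

# Server i (1-based) owns queries whose first letter falls in RANGES[i-1].
RANGES = ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwxyz"]

def route_to_server(prefix: str) -> int:
    first_char = prefix[0].lower()
    for server_id, letters in enumerate(RANGES, start=1):
        if first_char in letters:
            return server_id
    return 1  # digits and other characters fall back to a default partition (assumption)

print(route_to_server("face"))  # 2 -> Server 2 (E-H)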

Benefits:

  • Each server handles ~11 GB Trie (manageable)
  • Parallel processing
  • Independent scaling

Replication:

  • Each partition replicated 3x for availability
  • Master-slave setup: Master handles writes, slaves for reads

4.4 Handling Updates (Frequency Changes)

Challenge: 5B searches/day translate into millions of frequency updates

Solution: Batch Updates

Real-time Flow:

User searches "facebook"
1. Log to Kafka (async, non-blocking)
2. Return cached suggestions immediately
3. Don't update Trie in real-time

Batch Update Flow (Every 10 minutes):

1. Kafka Consumer reads last 10 min of queries
2. Aggregate: {"facebook": +15000, "face masks": +8000}
3. Update Cassandra (frequency table)
4. Background job rebuilds Trie sections
5. Swap old Trie with new Trie (atomic pointer swap)
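
A sketch of the aggregation in step 2, assuming the consumer hands us the window's raw query strings as a Python list; the real pipeline would read them from Kafka and write the resulting deltas to Cassandra.

from collections import Counter

def aggregate_window(query_log: list) -> Counter:
    """Collapse one 10-minute window of raw searches into per-query frequency deltas."""
    return Counter(q.strip().lower() for q in query_log if q.strip())

window = ["facebook", "Facebook", "face masks", "facebook login", "facebook"]
print(aggregate_window(window))
# Counter({'facebook': 3, 'face masks': 1, 'facebook login': 1})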

Trie Update Strategy:

Option 1: Full Rebuild (Daily)
- Build new Trie from scratch
- Takes 2-3 hours
- Swap pointers at 3 AM

Option 2: Incremental Update (Every 10 min)
- Update only changed nodes
- Faster but complex
- Risk of inconsistencies

Trade-off: Slight staleness (10 min) for performance


4.5 Data Storage

Cassandra Schema (Query Frequency Table):

CREATE TABLE query_stats (
    prefix TEXT,              -- first 3 chars: "fac"
    full_query TEXT,          -- complete query: "facebook"
    frequency BIGINT,         -- aggregated search count (refreshed by the batch job)
    last_updated TIMESTAMP,
    PRIMARY KEY ((prefix), frequency, full_query)
) WITH CLUSTERING ORDER BY (frequency DESC, full_query ASC);

-- Query for prefix "fac":
SELECT full_query, frequency
FROM query_stats
WHERE prefix = 'fac'
ORDER BY frequency DESC
LIMIT 10;

Why Cassandra?

  • Write-heavy: ~58K logged searches/sec, applied as batched frequency updates
  • Horizontal scaling
  • Counter tables for raw per-query increments (the ranked table above stores batch-written BIGINT counts, since counters can't be clustering columns)
  • Time-series friendly

Redis Cache (Hot Prefixes):

Key: "autocomplete:fac"
Value: [
{"text": "facebook", "score": 15000000},
{"text": "face masks", "score": 5200000},
...
]
TTL: 10 minutes

Cache Strategy:

  • Top 10K most popular prefixes cached
  • Hit rate: ~40% (power law distribution)
  • Eviction: LRU policy

4.6 Caching Strategy

Multi-layer Cache:

Layer 1: Client-side Cache (Browser)
- Cache: Last 50 queries
- Hit rate: ~20%
- TTL: Session duration

Layer 2: CDN/Edge Cache
- Cache: Popular prefixes (top 10K)
- Hit rate: ~40%
- TTL: 5 minutes

Layer 3: Redis Cluster
- Cache: All prefixes from last hour
- Hit rate: ~80%
- TTL: 10 minutes

Layer 4: Trie In-Memory
- Pre-computed top suggestions per node
- Hit rate: 100% (always in memory)

Fallback: Cassandra Database

Cache Invalidation:

  • Time-based: TTL of 10 minutes
  • Event-based: Breaking news triggers cache clear
  • Lazy update: Rebuild on next request
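
A read-through sketch for Layer 3, assuming the redis-py client and a hypothetical trie_lookup() call into the Trie servers; the key format and TTL follow the convention shown in 4.5.

import json
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed Redis endpoint
CACHE_TTL_SECONDS = 600  # 10 minutes

def trie_lookup(prefix: str) -> list:
    """Placeholder for the call to the Trie servers."""
    return [{"text": "facebook", "score": 15_000_000}]

def get_suggestions(prefix: str) -> list:
    key = f"autocomplete:{prefix}"
    cached = r.get(key)
    if cached is not None:                     # cache hit
        return json.loads(cached)
    suggestions = trie_lookup(prefix)          # cache miss: ask the Trie layer
    r.setex(key, CACHE_TTL_SECONDS, json.dumps(suggestions))
    return suggestions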

4.7 Fuzzy Matching (Typo Handling)

Challenge: User types "facbook" instead of "facebook"

Solutions:

1. Edit Distance (Levenshtein Distance):

EditDistance("facbook", "facebook") = 1 (insert 'e')

For each candidate in Trie:
    if edit_distance(user_input, candidate) <= 2:
        include in suggestions

Trade-off: Expensive to compute for all candidates

2. Phonetic Algorithms (Soundex/Metaphone):

Soundex("facebook") = "F212"
Soundex("facbook") = "F212" // Same code!

Store phonetic codes in Trie

3. Bigram/Trigram Matching:

"facbook" → bigrams: [fa, ac, cb, bo, oo, ok]
"facebook" → bigrams: [fa, ac, ce, eb, bo, oo, ok]

Similarity = (common bigrams) / (total unique bigrams)
           = 5 / 8 ≈ 0.63

Implementation:

  • Build inverted index: bigram → queries
  • At query time, find queries with similar bigrams

Chosen Approach: Hybrid

  • Use Trie for exact prefix
  • If < 3 results, fallback to fuzzy matching (edit distance ≤ 2)
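
A sketch of that hybrid fallback: exact prefix results first, then a plain dynamic-programming Levenshtein scan over a candidate set when results are sparse (fine for small candidate lists; production systems typically restrict candidates or use a BK-tree/SymSpell-style index instead of scanning).

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def suggest(user_input: str, exact_results: list, candidates: list) -> list:
    if len(exact_results) >= 3:
        return exact_results
    fuzzy = [c for c in candidates if edit_distance(user_input, c) <= 2]
    return exact_results + [c for c in fuzzy if c not in exact_results]

print(edit_distance("facbook", "facebook"))                            # 1
print(suggest("facbook", [], ["facebook", "face masks", "faceless"]))  # ['facebook']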

4.8 Personalization

User History Tracking:

Redis Key: "user:12345:history"
Value: [
{"query": "python tutorial", "clicked": true, "ts": 1696512000},
{"query": "java vs python", "clicked": false, "ts": 1696511000}
]

Personalization Logic:

For each suggestion:
    if user searched this before:
        boost score by 1.5x
    if user clicked this before:
        boost score by 2.0x
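
A boost sketch, assuming history lives under the Redis key shown above as a JSON list (redis-py client; the 1.5x/2x factors are the illustrative values from this section):

import json
import redis

r = redis.Redis()  # assumed Redis endpoint

def personalization_boost(user_id: str, suggestion: str, base_score: float) -> float:
    raw = r.get(f"user:{user_id}:history")
    history = json.loads(raw) if raw else []
    for entry in history:
        if entry["query"] == suggestion:
            return base_score * (2.0 if entry.get("clicked") else 1.5)
    return base_score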

Privacy:

  • Store only last 100 queries per user
  • Hash user IDs
  • TTL: 30 days
  • Opt-out option

4.9 Trending/Real-time Queries

Challenge: Breaking news "earthquake california" should appear immediately

Solution: Trending Detection

Algorithm:

1. Count queries in sliding windows:
- Last 5 minutes
- Last 1 hour
- Last 24 hours

2. Calculate spike ratio (current 5-minute count vs. the average 5-minute count over the past hour):
   spike = count_5min / (count_1hour / 12)

3. If spike > 10x:
   - Mark as trending
   - Boost score by 2x
   - Inject into hot cache

Example:
"earthquake california"
- Last hour: 500 searches
- Last 5 min: 8000 searches
- Spike = 8000 / (500/12) = 192x
- → TRENDING! Boost to top of suggestions

Implementation:

Kafka Stream Processing:
1. Consume query logs
2. Windowed aggregation (5 min windows)
3. Compare to baseline
4. Push trending queries to Redis
5. Update Trie with high priority
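
A sketch of the spike check itself, operating on in-memory window counts (in the real pipeline these counts would come from the stream processor's windowed aggregation):

from collections import Counter

SPIKE_THRESHOLD = 10.0

def detect_trending(count_5min: Counter, count_1hour: Counter) -> list:
    """Flag queries whose last-5-minute volume far exceeds their hourly baseline."""
    trending = []
    for query, recent in count_5min.items():
        baseline = count_1hour.get(query, 0) / 12 or 1  # average 5-min count over the hour
        if recent / baseline > SPIKE_THRESHOLD:
            trending.append(query)
    return trending

five_min = Counter({"earthquake california": 8000, "facebook": 120_000})
one_hour = Counter({"earthquake california": 500, "facebook": 1_500_000})
print(detect_trending(five_min, one_hour))  # ['earthquake california']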

4.10 Handling Special Cases

Case 1: Long Tail Queries

  • Problem: 80% of queries are unique (searched once)
  • Solution: Don't store queries with frequency < 5
  • Trade-off: Miss rare queries, save 80% storage

Case 2: Offensive/Adult Content

  • Solution: Blacklist filter
  • Maintain list of banned words/phrases
  • Filter at ingestion time (Kafka consumer)
  • Never show in suggestions

Case 3: Multi-word Queries

Query: "how to learn python"

Approach 1: Whole phrase matching
- Trie key: "how to learn python"
- Works but inflexible

Approach 2: Last word matching
- When user types "how to learn p"
- Match prefix "p" in context of "how to learn"
- More flexible, handles variations
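
A sketch of Approach 2: keep everything before the last token as fixed context and complete only the last token (word_trie_lookup is a hypothetical per-word lookup; a real system would also re-rank completions using the context).

def complete_last_word(user_input: str, word_trie_lookup) -> list:
    """Complete only the last token, keeping the earlier words as fixed context."""
    *context, last = user_input.split(" ")
    completions = word_trie_lookup(last)   # e.g. "p" -> ["python", "programming", ...]
    prefix = " ".join(context)
    return [f"{prefix} {word}".strip() for word in completions]

fake_lookup = lambda p: ["python", "programming", "piano"]
print(complete_last_word("how to learn p", fake_lookup))
# ['how to learn python', 'how to learn programming', 'how to learn piano']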

Case 4: Numbers/Special Characters

Query: "iphone 15 pro"

Trie structure:
i → p → h → o → n → e → (space) → 1 → 5 → (space) → p → r → o

Store spaces and numbers as nodes
Normalize queries: lowercase + trim

4.11 Geographic Personalization

Challenge: "football" means different things in US vs UK

Solution: Geo-based Tries

US Region:
"football" → [
    "football schedule",
    "NFL football scores",
    "fantasy football"
]

UK Region:
"football" → [
    "football premier league",
    "football transfers",
    "football world cup"
]

Implementation:

  • Maintain separate Tries per region
  • Route based on user's IP or explicit location
  • Fallback to global Trie if region-specific unavailable

5. Scalability & Performance Optimizations

5.1 Read Optimization

Pre-computation:

  • Store top 10 suggestions at each Trie node
  • Avoids traversing entire subtree at query time

Connection Pooling:

  • Persistent connections to Redis/Cassandra
  • Reduce connection overhead

Compression:

  • Compress Trie in memory (prefix compression)
  • Save 30-40% memory

5.2 Write Optimization

Async Logging:

  • Non-blocking writes to Kafka
  • Don't wait for acknowledgment

Batch Writes:

  • Aggregate 10 min of queries
  • Single batch update to Cassandra

Append-only Logs:

  • Never update Trie in-place
  • Build new version, swap atomically

5.3 Network Optimization

Protocol Buffers:

  • Use Protobuf instead of JSON
  • 50% smaller payload

HTTP/2:

  • Multiplexing
  • Header compression

Compression:

  • Gzip response bodies
  • 70% bandwidth savings

6. Monitoring & Metrics

Key Metrics:

Performance:

  • P50, P95, P99 latency (target: < 100ms)
  • QPS per server
  • Cache hit rates (CDN, Redis, Trie)

Quality:

  • Suggestion relevance (CTR - click-through rate)
  • Empty result rate (< 5%)
  • Typo correction rate

System Health:

  • Trie rebuild time
  • Kafka consumer lag (< 10 sec)
  • Redis memory usage
  • Cassandra write throughput

Alerts:

Alert: High Latency
Condition: P99 > 200ms for 5 minutes
Action: Auto-scale Trie servers

Alert: Low Cache Hit Rate
Condition: Redis hit rate < 70%
Action: Increase cache size or TTL

Alert: Trie Out of Sync
Condition: Trie age > 20 minutes
Action: Trigger emergency rebuild

7. Trade-offs & Considerations

Decision                  | Pros                   | Cons
Trie in-memory            | Fast lookups (O(p))    | Limited by RAM, expensive
Batch updates             | High write throughput  | 10-min staleness
Pre-computed suggestions  | Ultra-fast response    | Increased storage
Partitioning by letter    | Horizontal scaling     | Uneven distribution (Z has fewer queries)
Redis caching             | Reduces Trie load      | Cache invalidation complexity
Fuzzy matching            | Better UX for typos    | Increased latency

8. Summary

Key Design Decisions

  1. Trie Data Structure: O(p) prefix matching with pre-computed top-K
  2. Horizontal Partitioning: Scale by first letter(s)
  3. Multi-layer Caching: Client → CDN → Redis → Trie → Cassandra
  4. Batch Updates: Rebuild Trie every 10 min, not real-time
  5. Async Logging: Non-blocking query logs via Kafka
  6. Personalization: Boost user's previous queries
  7. Trending Detection: Real-time spike detection for breaking news

Scalability Achieved

  • 1.4M QPS (peak traffic)
  • < 100ms P99 latency
  • 5B unique queries supported
  • 99.99% availability
  • 40% CDN cache hit rate
  • Real-time trending (< 5 min delay)

9. Alternative Approaches

Approach 1: Full Database Lookup (No Trie)

SELECT query, frequency
FROM queries
WHERE query LIKE 'fac%'
ORDER BY frequency DESC
LIMIT 10;
  • Pros: Simple, no Trie complexity
  • Cons: Too slow (> 500ms), can't scale

Approach 2: Inverted Index (Elasticsearch)

Index:
{
"fac": ["facebook", "face masks", "factory"],
"ace": ["facebook", "face masks", "race car"]
}
  • Pros: Flexible, supports fuzzy matching
  • Cons: Slower than Trie (100-200ms), higher memory

Approach 3: ML Model (Neural Networks)

  • Train model to predict next characters
  • Pros: Learns context, handles typos naturally
  • Cons: High latency (300ms+), expensive training

Chosen: Trie + Redis (best latency/complexity trade-off)


10. Advanced Features

Voice Search Support

  • Convert speech to text (ASR)
  • Feed to autocomplete with confidence scores
  • Boost high-confidence suggestions

Image-based Suggestions

  • OCR on images
  • Extract text → feed to autocomplete
  • Show visual suggestions (thumbnails)

Context-aware Suggestions

Previous query: "python"
Current input: "tut"
Suggestions: ["tutorial", "tutorials for beginners"]
(biased toward Python tutorials)

A/B Testing Framework

  • Test ranking algorithms
  • Measure CTR, conversion
  • Roll out winner to 100%

11. Failure Scenarios & Recovery

Scenario 1: Trie Server Down

  • Detection: Health check fails
  • Recovery: Route to replica
  • Time: < 1 second (automatic failover)

Scenario 2: Redis Cluster Failure

  • Detection: All Redis nodes unreachable
  • Recovery: Fallback to Trie (slower but works)
  • Time: Degraded performance, no downtime

Scenario 3: Cassandra Outage

  • Impact: Can't update frequencies
  • Recovery: Serve stale data from Redis/Trie
  • Time: No user-facing impact (batch updates delayed)

Scenario 4: Kafka Lag

  • Detection: Consumer lag > 1 min
  • Recovery: Scale up consumers, replay messages
  • Time: 5-10 min to catch up

12. Interview Tips

What interviewers look for:

  1. Understand requirements: Low latency is critical
  2. Choose right data structure: Trie vs database vs inverted index
  3. Handle scale: 1M+ QPS, 5B queries
  4. Discuss trade-offs: Real-time updates vs batch
  5. Consider edge cases: Typos, trending, offensive content
  6. Monitoring: How to detect/fix issues

Common follow-ups:

  • "How would you handle typos?" → Edit distance, phonetic codes
  • "What if Trie doesn't fit in memory?" → Partitioning, compression
  • "How to make suggestions personalized?" → User history, ML
  • "How to detect trending queries?" → Spike detection algorithm

Red flags to avoid:

  • ❌ Saying "use database LIKE query" (too slow)
  • ❌ Ignoring latency requirements
  • ❌ Not discussing caching
  • ❌ Forgetting about updates (how to refresh Trie)